A new Gradient TD Algorithm with only One Step-size: Convergence Rate Analysis using L-λ Smoothness
Gradient Temporal Difference (GTD) algorithms (Sutton et al., 2008, 2009) are
the first O(d) algorithms (d is the number of features) that have convergence
guarantees for off-policy learning with linear function approximation. Liu et
al. (2015) and Dalal et al. (2018) proved that the convergence rates of GTD, GTD2
and TDC are O(t^{-α/2}) for some α ∈ (0, 1). This bound is tight
(Dalal et al., 2020), and slower than O(1/t). GTD algorithms also have
two step-size parameters, which are difficult to tune. In the literature, there is
a "single-time-scale" formulation of GTD. However, this formulation still has
two step-size parameters.
This paper presents a truly single-time-scale GTD algorithm for minimizing
the Norm of Expected TD Update (NEU) objective, and it has only one step-size
parameter. We prove that the new algorithm, called Impression GTD, converges at
least as fast as O(1/t). Furthermore, based on a generalization of
expected smoothness (Gower et al., 2019), called L-λ smoothness, we
are able to prove that the new GTD converges even faster, in fact, at a
linear rate. Our rate also improves on Gower et al.'s result, with a
tighter bound under a weaker assumption. Besides Impression GTD, we also prove
the rates of three other GTD algorithms: one by Yao and Liu (2008), another
called A-transpose-TD (Sutton et al., 2008), and a counterpart of
A-transpose-TD. The convergence rates of all four GTD algorithms are proved
in a single, generic GTD framework to which L-λ smoothness applies.
Empirical results on random walks, the Boyan chain, and the Baird counterexample
show that Impression GTD converges much faster than existing GTD algorithms on
both on-policy and off-policy learning problems, with well-performing step-sizes
over a wide range.
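With linear features, the NEU objective above takes the form NEU(θ) = ||E[δφ]||² = ||b − Aθ||², where A = E[φ(φ − γφ′)ᵀ] and b = E[rφ]. A minimal sketch of expected-update gradient descent on this objective, using a single step-size, is below. This is a generic illustration of NEU minimization, not necessarily the paper's Impression GTD, and the A and b used are hypothetical values chosen for the demo:

```python
import numpy as np

# Sketch: expected-update gradient descent on NEU(theta) = ||b - A theta||^2.
# Since grad NEU = -2 A^T (b - A theta), a single step-size alpha suffices:
#   theta <- theta + alpha * A^T (b - A theta)
# A and b below are hypothetical stand-ins for E[phi (phi - gamma phi')^T]
# and E[r phi]; note A need not be symmetric, which is the off-policy difficulty.
A = np.array([[2., 1., 0., 0.],
              [0., 2., 1., 0.],
              [0., 0., 2., 1.],
              [0., 0., 0., 2.]])
b = np.array([1., -1., 1., -1.])

def neu(theta):
    r = b - A @ theta
    return float(r @ r)

theta = np.zeros(4)
alpha = 0.1  # the single step-size parameter
for _ in range(2000):
    theta += alpha * A.T @ (b - A @ theta)

print(neu(np.zeros(4)), "->", neu(theta))
```

Because A is invertible here, the iteration contracts toward θ* = A⁻¹b and NEU is driven to zero; the linear contraction factor is governed by the eigenvalues of I − αAᵀA.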
Baird Counterexample is Solved: with an example of How to Debug a Two-time-scale Algorithm
The Baird counterexample was proposed by Leemon Baird in 1995 and was first used
to show that the Temporal Difference (TD(0)) algorithm diverges on this example.
Since then, it has often been used to test and compare off-policy learning
algorithms. Gradient TD algorithms solved the divergence issue of TD on the
Baird counterexample. However, their convergence on this example is still very
slow, and the nature of the slowness is not well understood; see, e.g., (Sutton
and Barto, 2018).
This note aims to understand, in particular, why TDC is slow on this example,
and provides a debugging analysis of this behavior. Our debugging
technique can be used to study the convergence behavior of two-time-scale
stochastic approximation algorithms. We also provide empirical results of the
recent Impression GTD algorithm on this example, showing that its convergence is
very fast, in fact, at a linear rate. We conclude that the Baird counterexample
is solved, by an algorithm with a convergence guarantee to the TD solution in
general and with a fast convergence rate.
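The divergence that makes this example famous is easy to reproduce. The sketch below sets up the 7-state Baird counterexample as described in Sutton and Barto (2018, Example 11.1) and runs expected off-policy semi-gradient TD(0) updates; the weights grow without bound even though the true value function (all zeros) is exactly representable:

```python
import numpy as np

# Baird's 7-state counterexample (per Sutton & Barto 2018, Example 11.1).
# All rewards are zero and gamma = 0.99. The target policy always takes the
# "solid" action into state 7, and the behavior policy visits all seven
# states uniformly, so the importance-corrected expected semi-gradient TD(0)
# update reduces to a uniform average of per-state TD errors.
gamma = 0.99

# Features: states 0..5 have value 2*w[s] + w[7]; state 6 has w[6] + 2*w[7].
Phi = np.zeros((7, 8))
for s in range(6):
    Phi[s, s] = 2.0
    Phi[s, 7] = 1.0
Phi[6, 6] = 1.0
Phi[6, 7] = 2.0

w = np.array([1., 1., 1., 1., 1., 1., 10., 1.])  # standard initialization
alpha = 0.01
norm0 = np.linalg.norm(w)

for _ in range(1000):
    # Under the target policy the next state is always state 6 (the 7th state),
    # with reward 0, so the TD error from state s is gamma*v(6) - v(s).
    delta = gamma * (Phi[6] @ w) - (Phi @ w)  # per-state TD errors
    w = w + alpha * (Phi.T @ delta) / 7.0     # expected semi-gradient update

print("||w|| grew from %.1f to %.1f" % (norm0, np.linalg.norm(w)))
```

Note this sketch uses deterministic expected updates rather than sampled transitions, so the instability is a property of the update itself, not of sampling noise.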
Class Interference of Deep Neural Networks
Recognizing and telling apart similar objects is hard even for human beings.
In this paper, we show that there is a phenomenon of class interference in
all deep neural networks. Class interference represents the learning difficulty
in data, and it accounts for the largest share of the generalization errors of
deep networks. To understand class interference, we propose cross-class tests,
class ego directions, and interference models. We show how to use these
tools to study the minima flatness and class interference of a trained model.
We also show how to detect class interference during training through the label
dancing pattern and class dancing notes.
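The cross-class tests and dancing notes above are the paper's own constructs; as a generic illustration of what class interference looks like in practice, one can inspect which class pairs a trained classifier confuses most via a confusion matrix. The sketch below (a hypothetical helper, not the paper's method) ranks class pairs by mutual confusions:

```python
import numpy as np

# Hypothetical illustration (not the paper's cross-class test): class
# interference shows up as off-diagonal mass in a confusion matrix, i.e.
# a few pairs of similar classes absorbing most of the generalization errors.

def top_interfering_pairs(y_true, y_pred, n_classes, k=1):
    """Return the k unordered class pairs with the most mutual confusions."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    # Symmetrize: interference between classes a and b counts both directions.
    inter = cm + cm.T
    np.fill_diagonal(inter, 0)
    pairs = [(inter[a, b], (a, b))
             for a in range(n_classes) for b in range(a + 1, n_classes)]
    pairs.sort(reverse=True)
    return [pair for _, pair in pairs[:k]]

# Toy labels: classes 0 and 1 interfere (cat/dog-style), class 2 is clean.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 0, 1, 1, 0, 1, 1, 2, 2, 2, 2]
print(top_interfering_pairs(y_true, y_pred, n_classes=3))  # -> [(0, 1)]
```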